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Abstract 

We extend the efFective fragment molecular orbital method (EFMO) into treating fragments connected 
by covalent bonds. The accuracy of EFMO is compared to FMO and conventional ab initio electronic 
^ , structure methods for polypeptides including proteins. Errors in energy for RHF and MP2 are within 2 

(— I ■ kcal/mol for neutral polypeptides and 6 kcal/mol for charged polypeptides similar to FMO but obtained 
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two to five times faster. For proteins, the errors are also within a few kcal/mol of the FMO results. We 
developed both the RHF and MP2 gradient for EFMO. Compared to ab initio, the EFMO optimized 
structures had an RMSD of 0.40 and 0.44 Afor RHF and MP2, respectively. 

Introduction 

The need to study very large systems in an efficient manner has led to the development of many compu- 
tational schemes trying to cope with the limitation in computational resources. Linear (or nearly linear) 
scaling methods have long been of particular interest because they allow, within their respective frame- 
work [THTT|. large systems to be treated by quantum mechanics. In particular, the use of fragments [T^lfT?] 
is very attractive for doing calculations of large systems. 

Recently, we developed the effective fragment molecular orbital (EFMO) method [T4|, which builds 
upon the fragment molecular orbital (FMO) method |15f[20] . and combines it with effective fragment 
potentials (EFF) [2Tti23]. EFMO is different from EFF, FMO and FMO/EFF [2l[25] in several ways. 
For instance, the EFPs are computed on-the-fiy from gas phase FMO fragment calculations and used for 
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classical interactions of separated dimcrs and many-body effects. Extending the earlier work |14| limited 
to molecular clusters at the RHF level, we now present the methodology to treat fragments connected by 
covalcnt bonds at the MP2 level. 

This article is organized as follows. First, we briefly outline the theoretical background of EFMO. We 
proceed to discuss the change in methodology needed to include fragmentation across covalent bonds in 
EFMO, including an overview of how fragment bonds are treated. The addition of correlation in EFMO 
is also presented here. Second, we benchmark the EFMO energy against ab initio calculations on three 
different sets of polypeptides and compare to FMO. We apply our findings to proteins and protein like 
structures. The quality of the gradient together with timings are also presented here. Water clusters are 
also briefly revisited. Finally, we summarize our results and discuss future directions. 

Methods 

Theoretical Background 

In FMO, the total two-body (FM02) non-correlated energy of a system consisting of N fragments (also 
called monomers) is given as 

^FM02 ^Y^^I + Jl -Ej- Ej) (1) 

/ IJ 

Here Ej (Ejj) is the energy of monomer / (dimcr IJ) in the electrostatic potential (ESP) of the other 
— 1 (TV — 2) fragments. The momoners converge in the field of ESP, requiring self-consistent charge 
(see) iterations. Dimcrs converge in the field of ESP of the N — 2 monomers. 

The total non-correlated EFMO energy of a system of N fragments was shown in previous work to be 

N -Rj.J<-Rrcsdim -Rl,J>-Rrcsdim 

i?™ = E^?+ E {eIj - E^ - E'^, - Efp^) + E ^l:7+<?' (2) 

/ I J I J 

where Ej is the gas phase energy of monomer (or fragment) /. Ejj is the gas phase dimer energy of dimer 
IJ. The second sum in equation [2] is the pairwise correction to the monomer energy and only applies 
for dimcrs separated by a distance less than i?rosdim- EJ^^ and Ef^^ arc the classical pair polarization 
energy of dimer I J and the classical total polarization energy, respectively. The final sum over Efj is 
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the classical electrostatic interaction energy and applies to dimers separated by a distance greater than 
-Rrcsdim- The fragment separation distance Rj j was defined previously [Tl]. Since EFMO only involves 
gas phase energy (and gradient) evaluations, only one SCC iteration is required. 

In EFMO, the classical terms in the energy expression (equation O are calculated from expressions 
in the EFP pertubation expansion of the interaction energy [2T1[22]. Based on the converged fragment 
calculations, EFP parameters are derived on-the-fly completely automatically. 

The analytical gradient derived previously |14| is reformulated for fragments connected by covalent 
bonds, and also extended to MP2. 

Covalent Bonds 

For fragmentation across covalent bonds, no corrections to the basic equation of EFMO is needed. How- 
ever, the inclusion of fragmentation across bonds requires a change in the methodology. In this paper, 
we show how fragmentation is carried out on protein backbones, but given any chemically reasonable 
fragmentation points this methodology is transferable to other systems. 

In regular FMO, two different schemes of fragmentation is possible. Common to both is that one 
specifies pairs of atoms which defines fragment boundaries (Figure [1]). Each detached bond is made of 
a bond attached atom (BAA) and a bond detached atom (BDA). The latter donates an electron to the 
fragment containing the BAA. One scheme is the hybrid orbital projection (HOP) approach [16], which 
allows full variational treatment of molecular orbitals (MO) across the bond during the fragment SCF. 
The other is the adapted frozen orbital (AFO) method [5S1[27| which freezes the occupied orbital that 
describes the bond [55]. EFMO uses the latter method, and for completeness we include a discussion of 
this particular scheme in this work. 

In AFO, a model system around the BAA and BDA is constructed (Figure [5]). RHF calculations are 
carried out on this system, followed by an Edminston-Ruedenberg localization |29] . The occupied orbital 
which has the largest overlap with the BDA and BAA is identified as the special bond orbital (SBO) 
shown on Figure |31 This orbital, along with several virtual orbitals on the BDA is stored for later use in 
monomer and dimer SCF calculations. 

For polypeptides, which is the main focus of this study, there is one SBO per pair of BAA and BDA. 
This SBO is associated with the fragment that contains the BAA. After the computation of all model 
systems, monomer calculations are done, followed by a Foster-Boys localization, where the SBO is kept 
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frozen, i.e. not allowed to mix with the rest of the orbitals. This leads to a polarizablc point in the 
centroid of the SBO (Figure [3]), obtained from the model system across the bond (Figure j^])- We have 
thus successfully eliminated the need to manually parametrize the bonds between pairs of fragments. 

In the original formulation of EFMO, the electric field arising from a static multipole or induced 
dipole in fragment / is screened by a Tang-Toennis type expression 

k{R, a,P) = l- exp (l + v^l-Rp) (3) 

Here, a and /3 are the screening parameters associated with fragments / and J, respectively. The distance 
parameter R is the vector between an induced dipole in fragment / and any of the electric moments in 
fragment J. The above expression is also the default in EFP [5T1[52] with the parameters a = /3 = 0.6. 
Wc emphasize that the screening parameters are associated with fragments and not individual polarizable 
points. In this study, we will investigate the need to change these screening parameters for EFMO since 
the original tuning of a in EFP was done for molecular dimers only. 

Correlation 

The introduction of correlation energy in the EFMO method follows previous work in FMO j5Dl - l5^ . The 
total correlated energy of a system of N fragments is given as 

^^^EFMO^^COR^ (4) 

Here e''-''^^ is given as the sum of monomer correlation energies Ef'''^ and pairwise corrections, i.e. 

N fl.f,J<i?corad 

£;COR^^£;COR^ J2 (i;CpR _ £;COR _ £;COR) ^ (5) 
/ IJ 

where Efp^ is the correlation energy of dimer IJ. The distance parameter i?corsd determines whether 
or not correlation is included for a specific dimer. The value of the parameter is discussed in the com- 
putational methodology section below. Note that for the correlation energy any size-extensive post-HF 
scheme can be used. 
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Computational Methodology 

All ab initio and fragment calculations were carried out in a locally modified version of GAMESS |33j . 
EFMO was parallelized with the generalized distributed data interface [33]. In all calculations, the 
6-3IG(d) [55H57] basis set was employed throughout unless specified otherwise. In all the geometry 
optimizations, a convergence criterion of 5.0 • 10^'* Hartrce / Bohr was used. 

The ab initio MP2 calculations had their integral accuracy increased to 10""'^^ (ICUT=12 in $CON- 
TRL), SCF convergence criterion was raised from 10"^ to 10^'^ (C0NV=lE-7 in $SCF) and the MP2 
code by K. Ishimura et. al [35] with AO integral transformation threshold increased from 10^^ to 10^^'^ 
(CODE=IMS and CUT0FF=1E-I2 in $MP2) to match what is used in FMO. 

For FMO (and EFMO), the AFO scheme was used throughout with the default settings for bond 
definitions (LOCAL^RUEDNBRG in SCONTROL and RAFO(I)=I,f ,1 in $FMO). The parameters for 
the electrostatic treatment of dimers i?rcsdim and the threshold for the inclusion of correlation effects i?corsd 
were both set to 2.0 (RESDIM=2.0 RCORSD=2.0 in $FMO) unless otherwise specified. The distances 
are relative to the van-der-Waals radii of atoms (see ref [14] for details). All EFMO calculations used 
multipoles up to quadrupoles distributed on atoms only. The determination of the polarizability tensors 
in EFMO are obtained with a Foster-Boys localization of the molecular orbitals (LOCAL=BOYS in 
SCONTRL) by a pcrtubation expression [32] (by specifying the POLAPP keyword in $LOCAL). The 
EFMO calculation (IEFM0=1 in $FMO) automatically enforces the Foster-Boys localization for the 
polarizability tensors. The screening parameter for all fragments are set to 0.1 for fragments with and 
without the SBO (SCREEN(1)=0. 1,0.1 in $FMO), respectively unless specified otherwise. 

The following structures used in this study were taken from previous work by Fedorov ct. al. |30II32II40| 
This includes a-helices (a — (ALA)„) and /3-sheets {(3 — (ALA)„) of alanine, Chignolin (PDB code: 
lUAO) and the Trp-cagc (PDB code: 1L2Y). Correlation effects on molecular clusters is carried out by 
investigating the structures from our previous study jl4j . The crystal structure of the 42 residue protein 
Crambinc (PDB code: ICRN) is also included and protonated using the PDB2PQR tool [iTlH^ . 

The three polypeptides used in this study were constructed by selecting six neutral (at pH = 7) 
amino acids AIVGLT (PI) and AVSNTL (P2) as well as four neutral and two non-neutral (at pH = 7) 
residues AVKNTD (P3) and padded with two glycine residues at each end for a total peptide length of 
10 residues. The polypeptides were protonated (at pH = 7) using the PDB2PQR tool. PI had neutral 
termini (arguments -neutralc -neutraln) while P2 and P3 both had charged termini. For each polypeptide. 
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a conformational search was carried out to locate twenty different structures using the ObConformer tool 
of the Open Babel package [ISHU] . They were finally minimized using PM6 gS] in MOPAC [35] with a 
bulk solvent (EPS=:80.1). 

Only results for two residues per fragment are discussed in detail below, and the results for one residue 
per fragment are shown in supplimentary materials. We note that because of the large charge transfer in 
some charged systems the one residue per fragment division leads to very considerable errors. 

When interpreting the accuracy of the results, the following quantities of errors are defined for energies. 
The error in energy 

^^A/,X ^^M_^X^ (6) 

the average deviation of conformers 

^E'jf = ^f:{Er-En (7) 

and the mean average deviation (MAD) for conformers 

Here, M is FM02/H0P, FM02/AF0 or EFMO and X is RHF or MP2. / runs through N conformers 
of polypeptides. To evaluate the quality of the EFMO gradient, numerical gradients (Vi?™™) were 
calculated on a-(ALA)io and compared to its analytical counterpart (VE^''^^) by the root mean square 
(rms) deviation of the individual elements 



\ 3Na ^ ' 

and the maximum deviation 

y£;max ^ ( | V^S^"^ - V,£;'™'"|) . (10) 

Na in equation |9] is the number of atoms in the molecule of interest, i in equation 1101 runs through 3Na 
atoms. 
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Results and Discussion 
Application to Polypeptides 

The performance of EFMO has a critical dependence on the screening parameter (equation [3]) because 
of the close position of a) induced dipoles located at the centroid of the SBO in one fragment and b) 
the nearby electrostatic moments and induced dipoles in another (especially, adjacent) fragment. In the 
following, the screening parameter for all fragments is a = 0.1 unless otherwise specified. 

FigurelHshows the MAD results obtained for two residues per fragment for all three polypeptides (PI, 
P2 and P3) using FM02/H0P, FM02/AF0 and EFMO for both RHF and MP2. For PI, RHF MAD 
values are 0.82 kcal/mol, 0.94 kcal/mol and 2.02 kcal/mol for FM02/H0P, FM02/AF0 and EFMO, 
respectively. The MP2 results yield 1.01 kcal/mol, 1.45 kcal/mol and 2.33 kcal/mol for PI respectively. 

For the charged polypeptide P2, MAD (Figure 01) increases by roughly a factor of two. The factor 
is about 3 for P3 (from 2.02 kcal/mol to 5.94 kcal/mol for the RHF energy). The inclusion of charged 
residues results in larger induced dipoles, which has a negative impact on the accuracy of the energy in 
EFMO. The accuracy of charged systems may be ameliorated by solvent screening. [47H49| . 

If one considers the average deviation (equations [5] and [7]) instead, it is interesting to note that EFMO 
compares well with FM02, and the agreement for P3 is perhaps fortuitous (the error is less than 0.5 
kcal/mol for EFM0-MP2). The maximum deviations for EFMO, however, are larger in all cases by 
roughly a factor of two. 

Application to Proteins 

The above benchmark of EFMO serves as an initial probe for how the energy behaves for polypeptides 
as the number of residues per fragment and screening parameters change. Based on those tests, we now 
apply EFMO to proteins or protein-like structures. The alanine polypeptides are particularly good for 
studying any systematic error, albeit they are not a representative benchmark for real proteins. 

In Table [TJ deviations in EFMO energy of the various protein structures compared to ab initio RHF 
(MP2) are presented for two residues per fragment with cutoffs i?rosdim and i?corsd both equal to 2.0. For 
Chignohn (lUAO), the deviation in energy for EFMO (equation i]) in RHF (MP2) energy is 1.79 (1.48) 
kcal/mol, and for FM02/AF0 it is 0.37 (1.38) kcal/mol. For the larger Trp-cage (1L2Y), the EFMO 
errors are -2.87 (-4.21) kcal/mol and for FM02/AF0 the values are 1.74 (6.35) kcal/mol. The Crambine 
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protein (ICRN) had errors of 15.66 (26.23) kcal/mol for EFMO, which is comparable to the FM02/AF0 
resuhs of 3.45 (25.59) kcal/moL EFMO shows errors of similar magnitude to FM02/AF0. 

The results from the a-heliccs and /3-sheets arc somewhat more detrimental. For a-helices, the error 
in energy increase with system size from 2.94 (0.32) kcal/mol for a— (ALA)io to 0.18 (-18.94) kcal/mol for 
the large a — (ALA)4o helix, which corresponds to an average error per residue of 0.29 (0.03) kcal/mol for 
a — (ALA)io and 0.00 (-0.47) kcal/mol for a — (ALA)4o. The a-helices tend to illustrate the case of over- 
polarization. For OL — (ALA)io, the total polarization energy is small (-12.89 kcal/mol) but as the system 
system size increase, so docs the total polarization energy (-73.81 kcal/mol) in a non-linear fashion. We 
note that the MP2 energy for a — (ALA)2o and a — (ALA)4o increases linearly with system size but the 
RHF energy does not. The over polarization is also observed for FM02/AF0, although the MP2 energies 
are much better (below 2 kcal/mol) which can only be attributed a better wave funtion of the individual 
fragments and their pairs. The /3-sheets have errors which are lower than in the a-alanines the errors are 
from 0.60 (-5.00) kcal/mol to 4.05 (6.46) kcal/mol for ^-(ALA)io and /3-(ALA)4o, respectively. Overall, 
the average error per residue becomes 0.06 (-0.50) kcal/mol and 0.10 (0.16) kcal/mol for /3 — (ALA)io 
and /3 — (ALA)4o, respectively. The /3-sheets are planar and not prone to the same over-polarization (the 
/3 — (ALA)4o has a polarization energy of around 50 kcal/mol). 

As noted above, the a-helices and /3-sheets illustrate two very different polypeptides. The inaccuracy 
of EFMO for them is somewhat alleviated by the fact that the errors in energy for Chignolin and the 
Trp-cage proteins are smaller than the a-helices and /3-sheets. The Trp-cage has 20 residues and its error 
in energy of -2.87 (-4.21) kcal/mol lie around the corresponding a-heliccs and /3-shcets of the same size 
-2.75 (-9.66) kcal/mol to 1.74 (2.78) kcal/mol, respectively. The same is true for Chignolin. 

Gradients and Geometry Optimizations 

A key strength of EFMO over other similar methods [THTT] is the availability of the gradient. The gradient 
of FM02/AF0 has been investigated previously for zeolites [57] where errors in gradient were found to 
be \1E™^: 0.2 ■ 10~^ Hartrce/Bohr and Vi?™^'': 1.4 • 10~^ Hartrce/Bohr when compared to numerical 
derivatives (equations IHl and llOp although with a smaller basis set than in this study. It was found, that 
even though these deviations were present, geometry optimizations did result in satisfactory structures. 

In this study, we present an investigation of the EFMO gradient comparing numerical and analytical 
values for proteins (Table[2]). It has roughly the same accuracy-related issues found for zeolites, specifically 
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around the bond regions where rms and maximum errors for FM02-RHF/AF0 with and without the 
electrostatic potential is 0.51 • 10^^ Hartree/Bohr, 3.43 ■ 10^'^ Hartree/Bohr and 0.76 • 10^'^ Hartree/Bohr 
and 4.71 • 10"^ Hartree/Bohr, respectively which is on par with what was found for zeolites. The latter 
result is particularly interesting as it is the FM02/AF0 result on top of which we add the EFP terms to 
obtain EFMO (equation [2]) . 

Several different approaches to tackle the gradient were attempted. The first is the original approach 
taken for molecular clusters which is to transfer the gradient terms of the induced dipoles /i'"^ to the 
nearest atom only, in this study named EFMOoig- This is a clear improvement over the FM02/AF0 
(without the ESP) result (Vi;"°^ 0.73 • 10"^ Hartree/Bohr, V-B'"^'': 3.50 • 10"^ Hartree/Bohr), but 
some deviations in gradient get worse using EFMO and will be discussed further below. Removing all 
torque contributions (EFMOnt) reveals further improvements (Vi?'''"*': 0.68-10^'^ Hartree/Bohr, ViS™'*'^: 
3.30 • lO"'^ Hartree/Bohr). Another approach, specifically for the induced dipole (EFMOnt+pct) is to do a 
percentage based distribution of the induced dipoles based on the distance between two atoms. This only 
applies if the induced dipole is between two atoms and the gradient is distributed based on a percentage 
of the entire bond length. This further improves the results, but the improvement {\7E™^: 0.66 • 10^^ 
Hartree/Bohr, V^;'"'''': 3.27 ■ 10^^ Hartree/Bohr) reveals that the main source of the error is not due 
to EFMO (Figure [5]), but pertains to approximations in the FM02/AF0 gradient. To make sure that 
the induced dipoles do not cause major problems, an approach was tried to not evaluate the electric 
field from the static multipole moments and the induced dipoles, both in the energy and the gradient, of 
adjacent fragments, that is fragment / covalently bound to fragment J does not induce dipoles in / and 
vice versa. Results with (EFMOnt+pct+adj) and without (EFMOnt+adj) percentage based distribution of 
induced dipoles are {VE™'': 0.66 • 10"^ Hartree/Bohr, VE"''''': 3.73 • 10"^ Hartree/Bohr) and {VE™"": 
0.66 • 10^3 Hartree/Bohr, V£;'°^^ 3.74 • lO'^ Hartree/Bohr) offer no clear advantage over EFMO„t+pct 
on the RHF level of theory, and consequently MP2 data are not presented. 

From Figure[ni it is clear that EFMO fixes some of the issues that FM02/AF0 has, but evidently cre- 
ates a few new ones at atom indices 111 (backbone nitrogen), 155 (backbone carboxylate) , 231 (backbone 
nitrogen) and 236 (backbone Cq,). Common to all is that it is around the bonding region. Evidently, 
small perturbations in the geometry, specifically around the bonding region, has large implications for 
the generated EFP parameters. For FM02-MP2/AFO and EFM0-MP2 (Figure [7] and Tabled, the 
errors in the gradient decrease for the EFMOnt+pct methodology {\7E™^: 0.61 • 10^^ Hartree/Bohr, 
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y^max. 2 . iQ 3 fJartree/Bohr) while FM02-MP2/AFO errors are very similar to the corresponding 
RHF values. 

Finally, geometry optimizations were carried out for q:-(ALA)io using the 6-31G(d) basis set and the 
EFMOnt+pct procedure. Figure [8] shows the improvement in energy as a function of the number of steps 
taken in a geometry optimization. The obtained optimized structures have the lowest energies when 
comparing to all the taken steps, even for one residue per fragment. Compared to RHF (MP2) optimized 
structures, the rms between the optimized structures are 0.40 (0.44) angstrom (EFMO with one residue 
per fragment did slightly worse). This can be compared to the 0.3 angstrom that was obtained for 
FM02-RHF with HOP previously [40]. 

EFMO offers a gradient whose quality is similar to FM02/AF0 calculations but at a reduced cost. 
The quality of the FM02/AF0 gradient could be improved if fully analytic derivatives available such 
as what was done by Nagata et. al. for HOP |50H52j . Another improvement can be obtained with an 
addition of the derivatives of the EFP monopoles (and higher order multipoles) as outlined by Xie et 
al. [5] We recommend EFMOnt+pct for geometry optimizations of polypeptides. 

Molecular Clusters 

Inclusion of correlation in EFMO (equation |4]) warrants a new benchmark of the water clusters that was 
used in the original EFMO paper. In Table [3l results for MP2 energies are shown for -Rrcsdim ~ ^corsd = 
2.0 for various basis sets. Since there are no covalent bonds, the screening parameter was given its original 
value of a = 0.6. In the original EFMO paper, the errors in energy for water clusters were discussed 
per hydrogen bond (HB) due to EFMO only describing higher order many-body effects for polarization 
(see ref [TJ for full details), thus, the error is a lack of many-body terms per HB. For EFM0-MP2, only 
monomer and ab initio dimers are considered correlated and the lack of treatment separated dimers gives 
rise to new errors but we expect these to be small. EFP does include dispersion terms [53] . but these are 
not included in this work. 

EFM0-MP2 does slightly worse (at 0.69 kcal/mol per HB) than FM02-MP2/AFO (-0.39 kcal/mol 
per HB) for the 6-31G(d) basis set as was observed for the RHF energies [13]. Increasing the basis set to 
6-31+G(d) shows that the EFMO error on average lowers to 0.02 kcal/mol per HB whereas the FM02 
error increases to an average of -0.71 kcal/mol per HB which is in line with what we observed at the 
RHF level of theory. Increasing the basis set further with diffuse functions reveals a slight improvement 
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for FM02 (-0.45 kcal/mol per HB) and negligible EFMO (-0.07 kcal/mol per HB) impact. 
Timings 

In our previous study [13], EFMO-RHF for molecular clusters were two (five) times faster than the 
corresponding FM02 energy (gradient) calculation. In Table 01 results for Chignolin and the Trp-cage 
are presented for 5 nodes using 2 cores per node. All timings were carried out on Intel Xeon X5550 
CPUs. Here, using EFM0-MP2 instead of EFMO-RHF increases the computation time by roughly a 
factor of two (from 14.0 minutes to 29.5 minutes for Chignolin using i?rcsdim = 2.0). For FM02, the 
same calculation takes 38.5 minutes and 58.6 minutes, respectively. An EFMO-RHF gradient evaluation 
for Chignolin takes only three minutes longer than the energy, but becomes a five-fold increase when 
running EFM0-MP2 gradients. The same trends are observed for the Trp-cage. We note a significant 
speedup when lowering the cutoff distances i?rosdim and i?corsd, especially for the larger Trp-cage. When 
the cut-off distances go down, the number of ab initio dimers decrease. Especially MP2 gradients require 
much CPU time due to the number of integrals that needs to be transformed [35] . 

Summary 

The effective fragment molecular orbital (EFMO) method is a merger of the effective fragment potential 
(EFP) method and the fragment molecular orbital (FMO) method and combines the general applicability 
of the FMO method (for example, to flexible biomolecules) with the speed of the EFP method. In this 
work, we have introduced new methodology needed to make EFMO work for systems with covalent bonds 
such as proteins. This, together with the analytical gradient provides an agile tool to treat proteins at 
a reasonable level of theory. We also showed how to incorporate electron correlation via M0ller-Plesset 
perturbation theory. 

We made an extensive study on small polypeptides to assess the need for screening when dealing with 
covalent bonds and found that an additional screening is needed compared to regular EFP. We showed 
that the deviations in energy on proteins are on par with FM02 to within a few kcal/mol when using 
two residues per fragment. For example, Chignolin is reproduced to within 0.1 kcal/mol compared to 
FM02. Timings were consistent with our previous work. We obtained two to five times speedup when 
using EFMO over FM02 for RHF. The speedup was somewhat lower when employing MP2 gradients, 
resulting in speedups between 1.6 and 2.3. 
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There are many ways in which the EFMO method can be improved and extended, for example, 
interfacing EFMO with the polarized continuum model (PCM) or the classical dispersion interaction in 
EFP |53| which would enable us to lower i?corsd compared to i?rcsdinn thus speeding up the evaluation 
of the gradient greatly. Another direction is to follow the multilayer FMO method [Sj and the recent 
frozen domain FMO (FMO/FD) method [55]. 

FMO has been applied [56H58] to a number of chemical problems, [59] and we expect that EFMO can 
be a useful method on its own, for example, in the structure optimization of protein-ligand complexes 
and other studies related to drug design. 
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Figures 




Figure 1. A model of a backbone in a protein. The model has side chains (Rl and R2) as well as 
the continuation of the backbone (R3 and R4). The bond attached atom (BAA) and the bond detached 
atom (BDA) face each other across the fragmentation point (marked with the yellow line). One 
fragment is shown within the yellow box. 
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Figure 2. The current model system used in this study for fragmentation across peptide 

bonds. The model is constructed automatically for use with AFO. The central atoms are the bond 
attached atom (BAA) and the bond detached atom (BDA). The atoms which arc connected directly to 
either the BAA or the BDA are included, capped with hydrogens as necessary. 
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Figure 3. Special bond orbital for bond 13 in The Trp-cage protein. The orbital is obtained 
using RHF/6-31G(d) on a model system (Figure [2]). 
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MAD Results, 2 Residues per Fragment, a=0.1 




PI P2 



Figure 4. Mean average deviations of FM02 and EFMO calculations. Results are compared 
to ab initio for conformers of the three polypeptides PI, P2 and P3 using two residues per fragment and 
the 6-31G(d) basis set. The screening parameter was set to a = 0.1 for all calculations. Energies in 
kcal/mol. 
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Figure 5. Average deviations of energy of FM02 and EFMO calculations compared to 

RHF and MP2. All the three polypeptides PI, P2 and P3 using two residues per fragment are shown. 
Labels on the figure represent the maximum observed deviation. The screening parameter was set to 
a = 0.1 for all calculations. Energies are in kcal/mol. 




Figure 6. Deviations of analytic gradient from the numeric gradient for RHF on 
a-(ALA)io. Shown in units of lO"^ Hartrcc/Bohr for FM02-RHF/AF0 and EFMO-RHF versus 
atomic coordinate for the 6-31G(d) basis set. 




Figure 7. Deviations of analytic gradient from the numeric gradient for MP2 on 

a- (ALA) 10. Shown in units of lO"^ Hartrcc/Bohr for FM02-MP2/AFO and EFM0-MP2 versus 

atomic coordinate for the 6-31G(d) basis set. 
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Figure 8. Convergence of energy as a function of number of geometry steps taken. Results 
are from an optimization of a-(ALA)io EFMO-RHF and EFM0-MP2 with both one and two residues 
per fragment calculated using the 6-31G(d) basis set. In all cases, the optimized geometries were 
optimized to a gradient threshold of 5.0 • 10^* Hartree/Bohr and all final structures had the lowest 
energies of all steps taken. 



27 



Tables 

Table 1. Energy Error compared to ab initio calculations on proteins and protein-like 
structures using two residues per fragment. 



EFMO FM02/AF0 





-^rcsdim 


= 2.0 


^rcsdim 


= 2.0 




RHP 


MP2 


RHF 


MP2 


a- (ALA) 10 


-2.94 


0.32 


-0.77 


-0.08 


/3-(ALA)io 


0.60 


0.89 


0.08 


0.25 


a-(ALA)2o 


-2.75 


-9.66 


-2.30 


-0.53 


/3-(ALA)2o 


1.74 


2.78 


0.22 


0.71 


a-(ALA)4o 


0.18 


-18.94 


-5.47 


-1.62 


/3-(ALA)4o 


4.05 


6.46 


0.51 


1.62 


Chignolin 


1.79 


1.48 


0.37 


1.38 


Trp-cage 


-2.87 


-4.27 


1.74 


6.35 


Crambine" 


15.66 


26.23 


3.45 


25.59 



"based on an FM03-MP2/6-31G(d) calculation. 



We used the 6-31G(d) basis set and i?rcsdim = -Rcorsd = 2.0. In all calculations, the screening parameter 
a was kept fixed at a value of a = 0.1. All units in kcal/mol. 
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Table 2. Errors in gradient of EFMO and FM02/AF0 for the q;-(ALA)io polypeptide. 





FM02 


FM02° 


EFMOorg 


EFMOnt 


EFMOnt+pct 


EFMOnt+adj 


EFMOnat+pct+adj 


RHF 


















0.51 


0.76 


0.73 


0.68 


0.66 


0.66 


0.66 




3.43 


4.71 


3.50 


3.30 


3.27 


3.73 


3.74 


MP2 
















V £;™'" 


0.70 


0.75 


0.69 


1.20 


0.61 








3.57 


4.53 


2.84 


2.91 


2.89 







" No ESP. 



Both RHF/6-31G(d) and MP2/6-31G(d) levels of theory are evaluated. All units in lO'^ Hartree/Bohr. 
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Table 3. Water cluster energy error for EFMO and FM02 relative to ab initio MP2 (in 
kcal/mol per hydrogen bond). 







EFMO 


FM02 


6-31G(d) 










31 


0.63 


-0.43 


20 


32 


0.66 


-0.37 




29 


0.78 


-0.38 


6-31+G(d) 










31 


0.02 


-0.69 


20 


32 


0.01 


-0.67 




29 


0.02 


-0.76 


6-31++G(d) 










31 


-0.05 


-0.44 


20 


32 


-0.04 


-0.43 




29 


-0.05 


-0.48 


6-31G(d) 








30 


51 


0.59 


-0.43 


40 


63 


0.79 


-0.41 


50 


86 


0.74 


-0.45 



Energies are calculated using the 6-31G(d), 6-31+G(d) and 6-31++G(d) basis sets. In all calculations 

-Rresdim = i?corsd =2.0 and Q; = 0.6 
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Table 4. Timings for FM02 and EFMO energy and gradient calculations on the Trp-cage 
protein. 





^rcsdirriT -f^corsd 


T(RHF) 


T(VRHF) 


r(MP2) 


r(VMP2) 






Chignolin 






EFMO 


1.0 


9.6 


11.1 


22.8 


102.8 




1.5 


13.2 


13.1 


28.7 


106.4 




2.0 


14.0 


17.0 


29.5 


119.0 


FM02 


2.0 


38.5 


59.7 


58.6 


149.9= 






Trp-cage 






EFMO 


1.0 


24.2'' 


23.5 


42.7 


161.0 




1.5 


33.7 


38.3 


70.7 


261.6 




2.0 


37.6 


43.0 


78.9 


314.0 


FM02 


2.0 


100.4 


187.0 


142.5 


408.6^^ 



" tested for both RHF/6-31G(d) and MP2/6-31G(d). All units in minutes. 
CPU utilization was ''96%, ^=85% and ''91%. All other were 99%. 



All timings were carried out on 5 nodes containing Intel Xeon X5550 CPUs (10 CPU cores total). 



